198,977 research outputs found
Dimensionality Reduction Mappings
A wealth of powerful dimensionality reduction methods has been established which can be used for data visualization and preprocessing. These are accompanied by formal evaluation schemes, which allow a quantitative evaluation along general principles and which even lead to further visualization schemes based on these objectives. Most methods, however, provide a mapping of a priorly given finite set of points only, requiring additional steps for out-of-sample extensions. We propose a general view on dimensionality reduction based on the concept of cost functions, and, based on this general principle, extend dimensionality reduction to explicit mappings of the data manifold. This offers simple out-of-sample extensions. Further, it opens a way towards a theory of data visualization taking the perspective of its generalization ability to new data points. We demonstrate the approach based on a simple global linear mapping as well as prototype-based local linear mappings.
Spectral Dimensionality Reduction
In this paper, we study and put under a common framework a number of non-linear dimensionality reduction methods, such as Locally Linear Embedding, Isomap, Laplacian Eigenmaps and kernel PCA, which are based on performing an eigen-decomposition (hence the name 'spectral'). That framework also includes classical methods such as PCA and metric multidimensional scaling (MDS). It also includes the data transformation step used in spectral clustering. We show that in all of these cases the learning algorithm estimates the principal eigenfunctions of an operator that depends on the unknown data density and on a kernel that is not necessarily positive semi-definite. This helps to generalize some of these algorithms so as to predict an embedding for out-of-sample examples without having to retrain the model. It also makes it more transparent what these algorithm are minimizing on the empirical data and gives a corresponding notion of generalization error. Dans cet article, nous étudions et développons un cadre unifié pour un certain nombre de méthodes non linéaires de réduction de dimensionalité, telles que LLE, Isomap, LE (Laplacian Eigenmap) et ACP à noyaux, qui font de la décomposition en valeurs propres (d'où le nom "spectral"). Ce cadre inclut également des méthodes classiques telles que l'ACP et l'échelonnage multidimensionnel métrique (MDS). Il inclut aussi l'étape de transformation de données utilisée dans l'agrégation spectrale. Nous montrons que, dans tous les cas, l'algorithme d'apprentissage estime les fonctions propres principales d'un opérateur qui dépend de la densité inconnue de données et d'un noyau qui n'est pas nécessairement positif semi-défini. Ce cadre aide à généraliser certains modèles pour prédire les coordonnées des exemples hors-échantillons sans avoir à réentraîner le modèle. Il aide également à rendre plus transparent ce que ces algorithmes minimisent sur les données empiriques et donne une notion correspondante d'erreur de généralisation.non-parametric models, non-linear dimensionality reduction, kernel models, modèles non paramétriques, réduction de dimensionalité non linéaire, modèles à noyau
Isometric sketching of any set via the Restricted Isometry Property
In this paper we show that for the purposes of dimensionality reduction
certain class of structured random matrices behave similarly to random Gaussian
matrices. This class includes several matrices for which matrix-vector multiply
can be computed in log-linear time, providing efficient dimensionality
reduction of general sets. In particular, we show that using such matrices any
set from high dimensions can be embedded into lower dimensions with near
optimal distortion. We obtain our results by connecting dimensionality
reduction of any set to dimensionality reduction of sparse vectors via a
chaining argument.Comment: 17 page
Dimensionality reduction with image data
A common objective in image analysis is dimensionality reduction. The most common often used data-exploratory technique with this objective is principal component analysis. We propose a new method based on the projection of the images as matrices after a Procrustes rotation and show that it leads to a better reconstruction of images
Spectral dimensionality reduction for HMMs
Hidden Markov Models (HMMs) can be accurately approximated using
co-occurrence frequencies of pairs and triples of observations by using a fast
spectral method in contrast to the usual slow methods like EM or Gibbs
sampling. We provide a new spectral method which significantly reduces the
number of model parameters that need to be estimated, and generates a sample
complexity that does not depend on the size of the observation vocabulary. We
present an elementary proof giving bounds on the relative accuracy of
probability estimates from our model. (Correlaries show our bounds can be
weakened to provide either L1 bounds or KL bounds which provide easier direct
comparisons to previous work.) Our theorem uses conditions that are checkable
from the data, instead of putting conditions on the unobservable Markov
transition matrix
Non-Redundant Spectral Dimensionality Reduction
Spectral dimensionality reduction algorithms are widely used in numerous
domains, including for recognition, segmentation, tracking and visualization.
However, despite their popularity, these algorithms suffer from a major
limitation known as the "repeated Eigen-directions" phenomenon. That is, many
of the embedding coordinates they produce typically capture the same direction
along the data manifold. This leads to redundant and inefficient
representations that do not reveal the true intrinsic dimensionality of the
data. In this paper, we propose a general method for avoiding redundancy in
spectral algorithms. Our approach relies on replacing the orthogonality
constraints underlying those methods by unpredictability constraints.
Specifically, we require that each embedding coordinate be unpredictable (in
the statistical sense) from all previous ones. We prove that these constraints
necessarily prevent redundancy, and provide a simple technique to incorporate
them into existing methods. As we illustrate on challenging high-dimensional
scenarios, our approach produces significantly more informative and compact
representations, which improve visualization and classification tasks
- …